Classification Comparison of Prediction of Solvent Accessibility From Protein Sequences

Authors

  • Huiling Chen
  • Huan-Xiang Zhou
  • Xiaohua Hu
  • Illhoi Yoo
Abstract

The prediction of residue solvent accessibility from protein sequences has been studied by various methods. These methods cannot be compared directly because they were evaluated on different datasets and with different structure definitions. In this paper we choose five classification approaches (Decision Tree (DT), Support Vector Machine (SVM), Bayesian Statistics (BS), Neural Network (NN) and Multiple Linear Regression (MLR)) and apply them to solvent accessibility prediction on the same dataset with the same structure definition, so that they can be compared directly. We evaluate these methods in a cross-validation test on 2148 unique proteins, using both single-sequence and multiple-sequence approaches, with a cutoff of 20% for the two-state definition of solvent accessibility. According to the experimental results, SVM and NN are the best predictors on multiple-sequence prediction, with an accuracy of 79% and a correlation coefficient of 0.59, 2~4% superior to the other three methods. A further test on a blind test set from the Critical Assessment of Techniques for Protein Structure Prediction experiment (CASP5) is consistent with this result. On single-sequence prediction, DT, BS and MLR perform about the same, at 71~72% accuracy with a correlation coefficient of 0.43. The improvement over the baseline model that uses only the identity of the target residue is small. The local sequence seems to embed very little information on accessibility. Separate training according to protein size improves the prediction when a sufficiently large dataset is available. The consensus prediction combining the five approaches is not significantly better than the best single method.
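The evaluation quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code and rests on several assumptions that the abstract does not confirm: that a residue counts as exposed when its relative solvent accessibility is at or above the 20% cutoff, that the reported correlation coefficient is the Matthews correlation coefficient, and that the consensus prediction is a simple majority vote over the methods. All function names are hypothetical.

```python
from math import sqrt

CUTOFF = 0.20  # two-state cutoff from the abstract (20% relative accessibility)


def to_two_state(rsa_values, cutoff=CUTOFF):
    """Map relative solvent accessibility values (0..1) to 1 = exposed, 0 = buried.

    Assumption: the boundary value itself counts as exposed (>=).
    """
    return [1 if rsa >= cutoff else 0 for rsa in rsa_values]


def confusion_counts(truth, pred):
    """Return (tp, tn, fp, fn) for two equally long lists of 0/1 labels."""
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(truth, pred):
    """Fraction of residues whose two-state label is predicted correctly."""
    tp, tn, _, _ = confusion_counts(truth, pred)
    return (tp + tn) / len(truth)


def matthews_cc(truth, pred):
    """Matthews correlation coefficient (assumed to be the paper's measure)."""
    tp, tn, fp, fn = confusion_counts(truth, pred)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when a row or column of the confusion matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def majority_vote(predictions):
    """Combine per-method label lists by strict majority (assumed consensus rule)."""
    n_methods = len(predictions)
    return [1 if 2 * sum(votes) > n_methods else 0 for votes in zip(*predictions)]


if __name__ == "__main__":
    # Made-up toy values, not data from the paper.
    rsa = [0.05, 0.20, 0.35, 0.10, 0.60, 0.00]
    truth = to_two_state(rsa)
    pred = [0, 1, 0, 0, 1, 1]
    print(accuracy(truth, pred), matthews_cc(truth, pred))
```

With five methods, as in the abstract, a strict majority always exists, so the vote never ties; a different consensus rule (for example, weighting by each method's confidence) would need a different combiner.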

Related resources

Combining sequence and structural profiles for protein solvent accessibility prediction.

Solvent accessibility is an important structural feature for a protein. We propose a new method for solvent accessibility prediction that uses known structure and sequence information more efficiently. We first estimate the relative solvent accessibility of the query protein using fuzzy mean operator from the solvent accessibilities of known structure fragments that have similar sequences to th...

Predicting Protein Solvent Accessibility with Sequence, Evolutionary Information and Context-based Features

Solvent-accessible surface areas of residues in proteins are key factors in protein folding. Predicting solvent accessibility from protein sequences is significant for modeling the structural and functional characteristics of many proteins. In this work, we introduce an approach of enhancing solvent accessibility prediction accuracy. We derive pseudo-potentials, by considering high-order inter-r...

Prediction of Protein Relative Solvent Accessibility with Support Vector Machines and Long-range Interaction

The prediction of protein relative solvent accessibility gives us helpful information for the prediction of tertiary structure of a protein. The SVMpsi method which uses support vector machines (SVMs) and the position specific scoring matrix (PSSM) generated from PSI-BLAST has been applied to achieve better prediction accuracy of the relative solvent accessibility. We have introduced a three di...

Prediction of relative solvent accessibility by support vector regression and best-first method

Since it is believed that the native structure of most proteins is defined by their sequences, using data mining methods to extract hidden knowledge from protein sequences is unavoidable. A major difficulty in mining bioinformatics data is the size of the datasets, which frequently contain large numbers of variables. In this study, a two-step procedure for prediction of relative so...

Prediction of protein relative solvent accessibility with support vector machines and long-range interaction 3D local descriptor.

The prediction of protein relative solvent accessibility gives us helpful information for the prediction of tertiary structure of a protein. The SVMpsi method, which uses support vector machines (SVMs), and the position-specific scoring matrix (PSSM) generated from PSI-BLAST have been applied to achieve better prediction accuracy of the relative solvent accessibility. We have introduced a three...

Publication date: 2004